Detecting errors in linked data using ontology learning and outlier detection
Abstract
Linked Data is one of the most successful implementations of the Semantic Web idea, as demonstrated by the large amount of interlinked data available in the repositories that make up the Linked Open Data cloud. Many of these datasets are not created manually but are extracted automatically from existing datasets. Thus, extraction errors that a human would easily recognize may go unnoticed and considerably diminish the usability of Linked Data. The sheer volume of data makes manual detection of such errors unrealistic, so automatic approaches for detecting them are desirable. To address this need, this thesis focuses on error detection at the logical level and at the level of numerical data. Moreover, the presented methods operate solely on the Linked Data dataset itself and do not require additional external data. The first two parts of this work deal with the detection of logical errors in Linked Data. It is argued that first formalizing the knowledge required for error detection into ontologies, and then applying these ontologies in a separate step, has several advantages over approaches that skip the formalization step. Consequently, the first part introduces inductive approaches for learning highly expressive ontologies from existing instance data as a basis for detecting logical errors. The proposed and evaluated techniques make it possible to learn class disjointness axioms as well as several property-centric axiom types from instance data. The second part of this thesis operates on the ontologies learned by the approaches proposed in the first part. First, their quality is improved by detecting errors possibly introduced by the automatic learning process. For this purpose, a pattern-based approach for finding the root causes of ontology errors, tailored to the specifics of the learned ontologies, is proposed and then used in the context of ontology debugging approaches.
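To make the idea of learning disjointness axioms from instance data, and then applying them to flag errors, more concrete, here is a minimal Python sketch. It is not the thesis's learning method: it substitutes a simple, hypothetical overlap heuristic that proposes two classes as disjoint when the number of shared instances, relative to the smaller class, does not exceed a tolerance. The function names `learn_disjointness` and `find_violations`, the parameters `max_overlap` and `min_support`, and the toy class assignments are all illustrative assumptions.

```python
from itertools import combinations


def learn_disjointness(instance_types, min_support=1, max_overlap=0.0):
    """Propose disjointness axioms (as class-name pairs) from instance data.

    instance_types: dict mapping an instance to the set of its asserted classes.
    Two classes are proposed as disjoint when the number of shared instances,
    divided by the size of the smaller class, is at most `max_overlap`.
    (Hypothetical heuristic for illustration only.)
    """
    # Invert the mapping: class -> set of its instances (its extension)
    extension = {}
    for inst, classes in instance_types.items():
        for cls in classes:
            extension.setdefault(cls, set()).add(inst)

    axioms = []
    for a, b in combinations(sorted(extension), 2):
        ext_a, ext_b = extension[a], extension[b]
        smaller = min(len(ext_a), len(ext_b))
        if smaller < min_support:
            continue  # too little evidence about one of the classes
        if len(ext_a & ext_b) / smaller <= max_overlap:
            axioms.append((a, b))
    return axioms


def find_violations(instance_types, axioms):
    """Return (instance, class_a, class_b) for every instance typed with two
    classes that a learned axiom declares disjoint."""
    return [
        (inst, a, b)
        for inst, classes in sorted(instance_types.items())
        for a, b in axioms
        if a in classes and b in classes
    ]


# Toy data: "Lyon" carries an erroneous Person type (a simulated extraction error)
types = {
    "Berlin": {"City", "Place"},
    "Paris": {"City", "Place"},
    "Hamburg": {"City", "Place"},
    "Lyon": {"City", "Place", "Person"},
    "Einstein": {"Person"},
    "Curie": {"Person"},
    "Bohr": {"Person"},
}
axioms = learn_disjointness(types, max_overlap=0.3)
print(axioms)  # [('City', 'Person'), ('Person', 'Place')]
print(find_violations(types, axioms))
# [('Lyon', 'City', 'Person'), ('Lyon', 'Person', 'Place')]
```

A tolerance above zero matters in this sketch: it lets the axioms be learned despite the noisy instance `Lyon`, which the learned axioms then flag as a candidate error, mirroring the learn-then-apply separation the abstract argues for.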
To conclude the logical error detection, the final chapter of the second part evaluates the use of learned ontologies for finding erroneous statements in Linked Data. To this end, a pattern-based error detection approach that employs the learned ontologies is applied to the DBpedia dataset and its results are evaluated manually; this evaluation shows that learned ontologies are adequate for logical error detection. The final part of this thesis complements the logical error detection with an approach for detecting data-level errors in numerical values. The presented method applies outlier detection techniques to datatype property values to find potentially erroneous ones, and additional preprocessing steps improve both the results and the performance of the detection step. Furthermore, a subsequent cross-checking step is proposed that addresses natural outliers, a problem inherent to outlier detection. In summary, this work introduces a number of approaches for detecting errors in Linked Data without requiring additional external data. The generated lists of potentially erroneous facts can serve as a first indication of errors, and the intermediate step of learning ontologies makes the overall workflow particularly well suited to scenarios involving human interaction.
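The outlier detection and cross-checking steps can be sketched as follows, under stated assumptions. The sketch uses a median/MAD-based modified z-score (with the commonly used cutoff of 3.5) as the outlier detector, which may differ from the techniques used in the thesis. It also interprets cross-checking as a second, independent outlier detection run over the same property's values taken from a linked source: a value flagged in both runs is treated as a natural outlier, and only values flagged in the primary data alone are reported. The function names `mad_outliers` and `cross_checked_errors` and the toy resources `A` to `G` are hypothetical; how the linked values are obtained (for example, via `owl:sameAs` links) is not shown.

```python
import statistics


def mad_outliers(values, threshold=3.5):
    """Return the keys whose values have a modified z-score above `threshold`.

    values: dict mapping resource -> numeric property value.
    """
    nums = list(values.values())
    med = statistics.median(nums)
    mad = statistics.median([abs(v - med) for v in nums])
    if mad == 0:
        return set()  # no spread at all: this sketch flags nothing
    # Modified z-score: 0.6745 * |x - median| / MAD
    return {k for k, v in values.items() if 0.6745 * abs(v - med) / mad > threshold}


def cross_checked_errors(primary, linked, threshold=3.5):
    """Report outliers in `primary` that are NOT also outliers in `linked`.

    primary, linked: dict resource -> value of the same property, where
    `linked` comes from an independent source (an assumption of this sketch).
    """
    suspicious = mad_outliers(primary, threshold)
    natural = mad_outliers(linked, threshold)
    # Exceptional in both independent runs -> treated as a natural outlier;
    # flagged in the primary data only -> reported as a potential error.
    return suspicious - natural


# Toy values for one numeric property; G's 11500 simulates an extraction error
primary = {"A": 100, "B": 110, "C": 120, "D": 105, "E": 115, "F": 2000, "G": 11500}
# The same property for the same resources, from an independent linked source
linked = {"A": 101, "B": 109, "C": 121, "D": 104, "E": 116, "F": 1990, "G": 114}
print(sorted(mad_outliers(primary)))  # ['F', 'G']
print(sorted(cross_checked_errors(primary, linked)))  # ['G']
```

Both `F` and `G` are outliers in the primary data, but only `F` is also exceptional in the linked source, so only `G` is reported. Resources without a linked counterpart are never treated as natural outliers here; a real pipeline would need an explicit policy for them.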
Similar resources
Outlier Detection Using Extreme Learning Machines Based on Quantum Fuzzy C-Means
One of the most important concerns of a data miner is to have accurate, error-free data: data that contains no human errors and whose records are complete and correct. In this paper, a new learning model based on an extreme learning machine neural network is proposed for outlier detection. The function of neural networks depends on various parameters such as the structur...
Outlier Detection in Wireless Sensor Networks Using Distributed Principal Component Analysis
Detecting anomalies is an important challenge for intrusion detection and fault diagnosis in wireless sensor networks (WSNs). To address the problem of outlier detection in wireless sensor networks, in this paper we present a PCA-based centralized approach and a DPCA-based distributed energy-efficient approach for detecting outliers in sensed data in a WSN. The outliers in sensed data can be ca...
Detecting Suspicious Card Transactions in unlabeled data of bank Using Outlier Detection Techniqes
With the advancement of technology, the use of ATMs and credit cards has increased. Cyber fraud and theft are among the threats that accompany the use of these technologies. It is therefore essential to use fraud detection algorithms to prevent fraudulent use of bank cards. Credit card fraud can be thought of as a form of identity theft that consists of unauthorized access to another person's ...
Detecting Errors in Numerical Linked Data Using Cross-Checked Outlier Detection
Outlier detection used for identifying wrong values in data is typically applied to single datasets to search them for values of unexpected behavior. In this work, we instead propose an approach which combines the outcomes of two independent outlier detection runs to get a more reliable result and to also prevent problems arising from natural outliers which are exceptional values in the dataset...
Identification of outliers types in multivariate time series using genetic algorithm
Multivariate time series data are often modeled using the vector autoregressive moving average (VARMA) model. However, the presence of outliers can violate the stationarity assumption and may lead to wrong modeling, biased parameter estimates, and inaccurate predictions. Thus, detection of these points and how to deal properly with them, especially in relation to modeling and parameter estimation of VARMA m...
Publication date: 2015